Search results for " Graphics Processing Unit."
showing 10 items of 17 documents
GPU accelerated Monte Carlo simulations of lattice spin models
2011
We consider Monte Carlo simulations of classical spin models of statistical mechanics using the massively parallel architecture provided by graphics processing units (GPUs). We discuss simulations of models with discrete and continuous variables, and using an array of algorithms ranging from single-spin flip Metropolis updates over cluster algorithms to multicanonical and Wang-Landau techniques to judge the scope and limitations of GPU accelerated computation in this field. For most simulations discussed, we find significant speed-ups by two to three orders of magnitude as compared to single-threaded CPU implementations.
Towards an Efficient Implementation of an Accurate SPH Method
2020
A modified version of the Smoothed Particle Hydrodynamics (SPH) method is considered in order to overcome the loss of accuracy of the standard formulation. The summation of Gaussian kernel functions is employed, using the Improved Fast Gauss Transform (IFGT) to reduce the computational cost, while tuning the desired accuracy in the SPH method. This technique, coupled with an algorithmic design for exploiting the performance of Graphics Processing Units (GPUs), makes the method promising, as shown by numerical experiments.
CUSHAW2-GPU: Empowering Faster Gapped Short-Read Alignment Using GPU Computing
2014
We present CUSHAW2-GPU to accelerate the CUSHAW2 algorithm using compute unified device architecture (CUDA)-enabled GPUs. Two critical GPU computing techniques, namely intertask hybrid CPU-GPU parallelism and tile-based Smith-Waterman map backtracking using CUDA, are investigated to facilitate fast alignments. By aligning both simulated and real reads to the human genome, our aligner yields comparable or better performance compared to BWA-SW, Bowtie2, and GEM. Furthermore, CUSHAW2-GPU with a Tesla K20c GPU achieves significant speedups over the multithreaded CUSHAW2, BWA-SW, Bowtie2, and GEM on the 12 cores of a high-end CPU for both single-end and paired-end alignment.
Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model
2010
A Modern Graphics Processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., Journal of Chemical Physics 228 (2009) 4468–4477] in order to overcome the memory limitations of a single GPU which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors up to 35 compared to an optimized single Central Processor Unit (CPU) core implementation which employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message P…
GSaaS: A Service to Cloudify and Schedule GPUs
2018
Cloud technology is an attractive infrastructure solution that provides customers with an almost unlimited on-demand computational capacity using a pay-per-use approach, and allows data centers to increase their energy and economic savings by adopting a virtualized resource sharing model. However, resources such as graphics processing units (GPUs), have not been fully adapted to this model. Although, general-purpose computing on graphics processing units (GPGPU) is becoming more and more popular, cloud providers lack of flexibility to manage accelerators, because of the extended use of peripheral component interconnect (PCI) passthrough techniques to attach GPUs to virtual machines (VMs). F…
CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations
2013
We propose the Sliced Coordinate Format (SCOO) for Sparse Matrix-Vector Multiplication on GPUs.An associated CUDA implementation which takes advantage of atomic operations is presented.We propose partitioning methods to transform a given sparse matrix into SCOO format.An efficient Dual-GPU implementation which overlaps computation and communication is described.Extensive performance comparisons of SCOO compared to other formats on GPUs and CPUs are provided. Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU are outperforming their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA i…
GPU-Based Optimisation of 3D Sensor Placement Considering Redundancy, Range and Field of View
2020
This paper presents a novel and efficient solution for the 3D sensor placement problem based on GPU programming and massive parallelisation. Compared to prior art using gradient-search and mixed-integer based approaches, the method presented in this paper returns optimal or good results in a fraction of the time compared to previous approaches. The presented method allows for redundancy, i.e. requiring selected sub-volumes to be covered by at least n sensors. The presented results are for 3D sensors which have a visible volume represented by cones, but the method can easily be extended to work with sensors having other range and field of view shapes, such as 2D cameras and lidars.
CUDA-BLASTP: Accelerating BLASTP on CUDA-enabled graphics hardware
2011
Scanning protein sequence database is an often repeated task in computational biology and bioinformatics. However, scanning large protein databases, such as GenBank, with popular tools such as BLASTP requires long runtimes on sequential architectures. Due to the continuing rapid growth of sequence databases, there is a high demand to accelerate this task. In this paper, we demonstrate how GPUs, powered by the Compute Unified Device Architecture (CUDA), can be used as an efficient computational platform to accelerate the BLASTP algorithm. In order to exploit the GPU's capabilities for accelerating BLASTP, we have used a compressed deterministic finite state automaton for hit detection as wel…
Three-dimensional Fuzzy Kernel Regression framework for registration of medical volume data
2013
Abstract In this work a general framework for non-rigid 3D medical image registration is presented. It relies on two pattern recognition techniques: kernel regression and fuzzy c-means clustering. The paper provides theoretic explanation, details the framework, and illustrates its application to implement three registration algorithms for CT/MR volumes as well as single 2D slices. The first two algorithms are landmark-based approaches, while the third one is an area-based technique. The last approach is based on iterative hierarchical volume subdivision, and maximization of mutual information. Moreover, a high performance Nvidia CUDA based implementation of the algorithm is presented. The f…
Real-Time Monocular Segmentation and Pose Tracking of Multiple Objects
2016
We present a real-time system capable of segmenting multiple 3D objects and tracking their pose using a single RGB camera, based on prior shape knowledge. The proposed method uses twist-coordinates for pose parametrization and a pixel-wise second-order optimization approach which lead to major improvements in terms of tracking robustness, especially in cases of fast motion and scale changes, compared to previous region-based approaches. Our implementation runs at about 50–100 Hz on a commodity laptop when tracking a single object without relying on GPGPU computations. We compare our method to the current state of the art in various experiments involving challenging motion sequences and diff…